Glutamine metabolism-related genes predict the prognostic risk of acute myeloid leukemia and stratify patients by subtype analysis

Background Acute myeloid leukemia (AML) is a genetically heterogeneous disease in which glutamine (Gln) contributes to AML progression. Therefore, this study aimed to identify potential prognostic biomarkers for AML based on Gln metabolism-related genes. Methods Gln-related genes that were differentially expressed between Cancer Genome Atlas-based AML and normal samples were analyzed using the limma package. Univariate, least absolute shrinkage, selection operators, and stepwise Cox regression analyses were used to identify prognostic signatures. Risk score-based prognostic and nomogram models were constructed to predict the prognostic risk of AML. Subsequently, consistent cluster analysis was performed to stratify patients into different subtypes, and subtype-related module genes were screened using weighted gene co-expression network analysis. Results Through a series of regression analyses, HGF, ANGPTL3, MB, F2, CALR, EIF4EBP1, EPHX1, and PDHA1 were identified as potential prognostic biomarkers of AML. Prognostic and nomogram models constructed based on these genes could significantly differentiate between high- and low-risk AML with high predictive accuracy. The eight-signature also stratified patients with AML into two subtypes, among which Cluster 2 was prone to a high risk of AML prognosis. These two clusters exhibited different immune profiles. Of the subtype-related module genes, the HOXA and HOXB family genes may be genetic features of AML subtypes. Conclusion Eight Gln metabolism-related genes were identified as potential biomarkers of AML to predict prognostic risk. The molecular subtypes clustered by these genes enabled prognostic risk stratification.


Background
Acute myeloid leukemia (AML) is a fatal cancer characterized by increased self-renewal and uncontrolled proliferation of malignant bone marrow stem cells, accompanied by infection, hemorrhage, and organ infiltration [1].A small percentage of cases have been determined to be affected by causative factors such as chemotherapy or chemical exposure, but the vast majority develop due to chromosomal abnormalities and gene mutations [2].As a genetically heterogeneous disease, more than 97% of AML cases have recognizable somatic mutations [3].Therefore, cytogenetic markers are currently the most important indicators for the risk stratification and treatment of patients with AML [4].However, AML is still associated with relatively poor survival, and recent data have reported a 5-year overall survival rate of 21% for AML, similar to that of solid organ malignancies with a high fatality rate [5].This difficulty lies in the fact that the prognosis of AML is closely related to the genetic characteristics of the disease, leading to variability in treatment and prognosis.This study summarized a series of gene mutations, including tumor protein p53 (TP53), nucleophosmin, fms-related receptor tyrosine kinase 3, and CCAAT enhancer binding protein alpha, that may serve as potential prognostic markers and targets for AML [6,7].However, the complexity and specificity of each patient's genetic profile have forced researchers to continually identify novel prognostic markers to predict an individual's response to treatment, thereby enabling effective personalized treatment.
Metabolic reprogramming is a key manifestation of AML and is closely associated with clinical diagnosis, risk stratification, and targeted drug development [8].Cellular metabolism in AML is genotype-specific and is accompanied by epigenetic changes, somatic mutations, and activation of downstream cancer-promoting pathways [9].Amino acid metabolism plays a role in regulating redox homeostasis and maintaining cell proliferation [10].Glutamine (Gln), a non-essential amino acid, is the most abundant amino acid in human blood.However, when the energy requirement for the rapid proliferation of cancer cells is not met, Gln can be converted to be conditionally essential and contribute to AML cell proliferation [11].Removal of Gln induces apoptosis in AML cells by inhibiting the mechanistic target of rapamycin complex 1 pathway [12].Therefore, screening for molecular targets of Gln metabolism may help develop novel AML treatment strategies and improve patient prognosis.
However, no systematic study has been conducted to comprehensively screen biomarkers of AML from the perspective of Gln metabolism to predict prognosis.Therefore, we conducted a series of bioinformatics analyses to screen prognostic signatures from Gln metabolism-related genes to predict prognostic risk and stratify patients to further identify their genetic and immune characteristics.The potential prognostic markers identified in this study may help optimize treatment choices for patients, reduce prognostic risk, and deepen the biological understanding of AML.

Data search and information
Gene expression profiles of AML were obtained from the Cancer Genome Atlas (TCGA) [13].Based on the available clinical information, AML samples with prognostic information and survival time over 30 days, totaling 149 cases, were enrolled in this study.In addition, 337 whole blood samples from the Genotype-Tissue Expression (GTEx) database were used as normal controls.These 486 samples were used as the training set for subsequent analyses.
The validation set, GSE71014, was obtained from the Gene Expression Omnibus (GEO) database [14].It was detected on the GPL10558 Illumina HumanHT-12 V4.0 expression bead chip and comprised 104 AML samples.Finally, 96 AML samples with a survival time of more than 30 days were included in this study.

Screening of differentially expressed genes (DEGs) related to gln metabolism
In GeneCards, 704 protein-coding genes with correlation coefficients greater than eight for Gln metabolism were defined as Gln-related genes, of which 639 were matched to the training set.By comparing the gene expression profiles of these 639 genes between AML and normal groups, Gln-related DEGs (Gln-DEGs) were selected using the limma package (Version 3.52.4)[15], at thresholds of adj.p < 0.05 & |log 2 fold change (FC)| > 2.

Genetic mutation of Gln-DEGs
The somatic mutation maf files of AML, processed using Mutect software, were downloaded from TCGA.The oncoplot function of the R package maftools was used to plot the mutation waterfall of the TOP10 mutated genes in the Gln-DEGs.

Identification of prognostic signatures to construct the risk score-based model
Based on the Gln-DEGs expression matrix, univariate Cox regression analysis in the R survival package was HOXA and HOXB family genes may be genetic features of AML subtypes.Keywords Acute myeloid leukemia, Glutamine, Prognostic model, Molecular subtype, Immune infiltration employed to screen for genes significantly correlated with prognosis at the expression level with a cutoff of p < 0.01.The least absolute shrinkage and selection operator (LASSO) Cox regression analysis in the R. glmnet package (version 1.2) [16] was used to further screen key genes by penalization parameter tuning through 10-fold cross-validation.Finally, the prognostic signatures were screened using stepwise Cox regression analysis in the R. survminer package (version 0.4.9) to construct a prognostic model according to the risk score, which was calculated as follows: In this formula, exp indicates the expression level of prognostic signatures, while β represents the stepwise regression coefficient of this gene.The high-and lowrisk groups were bounded by the median risk score in the training and validation cohorts.The association between risk score and the actual prognosis was assessed using Kaplan-Meier (KM) analysis in the survival package of R3.6.1 (version 2.41-1).In contrast, the predictive efficacy of the risk score for prognosis at 1, 3, and 5 years was estimated using receiver operating characteristic (ROC) curves.

Screening of independent prognostic factors to establish a nomogram model
This study further included clinical information (age, race, sex, and FAB subtype) as well as risk scores in the univariate and multivariate Cox regression analyses to screen for independent prognostic factors with a p-value less than 0.05.These factors were used to construct a nomogram model to evaluate the predictive relationship between these factors and prognosis using the rms package in R (version 5.1-2) [17].KM and ROC curves were used to validate the predictive efficacy of the nomogram.

Immuno-analysis of prognostic signatures
The CIBERSORT algorithm [18] was used to estimate the infiltration abundance of key immune cells in high-and low-risk groups.The cor function in R was used to evaluate the relationship between the prognostic signatures and immune cells by calculating the Spearman correlation coefficient.Expression data for 36 immune checkpoint genes and 15 human leukocyte antigen (HLA) family genes were also extracted to compare their expression differences between high-and low-risk groups using the Wilcoxon test.

Identification and comparison of molecular subtypes of AML
In this study, we used the R. ConsensusClusterPlus package (Version:1.58.0) [19] to perform a consistent clustering analysis of AML samples using hierarchical clustering based on Spearman correlation coefficients to cluster and identify different AML subtypes.Similarly, the KM curve was used to compare the survival differences between the two risk groups.The abundance of immune cell infiltration in each subtype sample was calculated using CIBERSORT and was compared between the two subtypes.

Screening of key genes related to AML subtypes using weighted gene co-expression network analysis (WGCNA)
This study performed WGCNA on the top 20% of genes with an absolute deviation from the median of expression value screening in AML samples.WGCNA utilizes module eigengenes to differentiate modules.By calculating the correlations between modules and modules as well as between modules and traits, modules that were highly correlated with the traits were screened, and key genes were selected from the modules.This study used subtypes as phenotypic traits, and key genes were selected from modules related to AML subtypes using the WGCNA package in R (version 1.71) [20].

Protein-to-protein interaction (PPI) analysis
The STRING database (version 11.5) [21] was used to predict the interactions between gene-encoded proteins.In this study, PPI analysis was carried out on subtyperelated module genes based on the STRING database, with the species as homo sapiens and the parameter set as an interaction score of 0.4.

Construct a risk score-based model to predict prognostic risks
Based on the expression data and stepwise regression coefficients of the eight prognostic signatures, the risk score for each sample in the training and validation sets was calculated to establish the prognostic models.After defining the high-and low-risk groups in the training set, it was suggested that patients who died were more frequently distributed in the group with a high prognostic risk (Fig. 3A).The KM curves also confirmed a significant difference in survival probability between the two groups (Fig. 3A).ROC curves were then plotted to assess the sensitivity and specificity of the model based on the training set in predicting AML prognostic risks.The areas under the curve (AUCs) of 1-, 3-, and 5-year ROC were 0.85, 0.846, and 0.875, respectively (Fig. 3A), suggesting the predictive potential of this prognostic model.Furthermore, the model was reconstructed using the validation set, and the results were consistent with the above findings.The validation set-based prognostic model also significantly distinguished the survival status of samples under different risk groups and accurately predicted AML prognostic risks (Fig. 3B).

Screening for independent prognostic factors and constructing a nomogram to predict survival
By integrating the clinical information and risk scores, independent prognostic factors were selected using univariate and multivariate Cox regression analyses.Age (HR = 1.020, 95% Cl = 1.005-1.036,p = 0.010) and risk score (HR = 4.905, 95% Cl = 3.063-7.856,p < 0.001) were determined to have prognostic independence (Fig. 4A,  B).These two factors were incorporated into the nomogram model to predict survival (Fig. 4C).The fitting curves suggested that the overall survial predicted by the model converged with the actual survival (Fig. 4D).The KM curve further demonstrated that patients with higher risk scores had significantly worse prognoses (Fig. 4E).The AUCs of the ROC curves were all over 0.84, indicating that the nomogram was highly sensitive and specific in predicting the 1-, 3-, and 5-survival statuses (Fig. 4F).

Evaluation of immune landscape
In this study, the CIBERSORT algorithm was used to quantify immune cells in all samples.By comparing the high-and low-risk groups, we found five types of immune cells (CD8 T cells, γδT cells, resting NK cells, activated NK cells, and resting mast cells) were significantly different infiltrated between the two groups, and the proportions of these five cells were all significantly decreased in the high-risk group (Fig. 5A).The correlation between the expression levels of the eight prognostic signatures and the infiltration levels of these cells was shown in Fig. 5B.Furthermore, the majority of immune checkpoint genes and HLA family genes were found to have significant differences in expression between the high-and lowrisk groups (Fig. 5C and D), suggesting different immune statuses between the two groups.

Identification and comparison of AML molecular subtypes
Unsupervised cluster analysis was performed based on the expression data of the eight prognostic signatures in the AML samples.By setting the range of K values from 2 to 6, the optimal K = 2 was selected (the curve was more stable) (Fig. 6A), and two clusters were obtained (Fig. 6B).By comparing the survival status between clusters 1 and 2, the KM curve revealed that patients in cluster 1 were prone to a favorable prognosis (Fig. 6C).The Sankey diagram also showed that patients who died were more distributed in Cluster 2, which comprised a greater proportion of patients with high prognostic risks (Fig. 6D).Using CIBERSORT, we identified six types of immune cells showing differences in infiltration between the two subtypes (Fig. 6E).Among them, CD8 T cells, γδT cells, and resting mast cells were significantly decreased in infiltration in cluster 2, which was more distributed in the high-risk group.

Screening of hub genes using WGCNA
To screen for key genes further, we used subtype as a trait to construct a WGCNA network.When the power was 4, the network approximated a scale-free network distribution (Fig. 7A).Based on the dynamic tree-cutting algorithm, we set the minimum number of genes in each module to 30 and finally obtained seven modules (Fig. 7B).By calculating the relationships between the modules and subtypes, the pink module was found to have the strongest correlation with both subtypes (Fig. 7C).Next, 98 genes in the pink module were included in the PPI analysis.Based on the STRING database, a PPI network comprising 48 genes and 161 interaction pairs was constructed (Fig. 7D).Among them, the homeobox A cluster (HOXA) and homeobox B cluster (HOXB) family genes were considered to contribute more to AML subtypes because they had more degrees of connection in this PPI network.

Discussion
Gln inhibition correlates with the level of glutaminase activity, and glutaminase inhibitors can suppress AML cell growth and induce cell apoptosis and differentiation of disease subtypes.Therefore, inhibition of Gln uptake is an attractive new strategy for treating AML [12,22].Key targets in the Gln metabolic pathway, such as protooncogenes, glutamic-pyruvic transaminase 2, and solute carrier family 1 member 5, are regulated by insulin-like growth factor 2 mRNA-binding protein 2 in an m6Adependent manner to promote AML development and stem cell self-renewal [23].Therefore, based on the 704 genes related to Gln metabolism, we screened 387 genes with significant differences in expression between AML and normal controls.Using univariate, LASSO, and stepwise Cox regression analyses, eight genes (HGF, ANG-PTL3, MB, F2, CALR, EIF4EBP1, EPHX1, and PDHA1) were identified as a prognostic signature.The prognostic model constructed using the eight genes accurately predicted the prognostic risk in patients with AML, even in the validation cohort.Based on the expression levels of these genes, this study also categorized AML into two molecular subtypes, in which the prognostic risk of patients in cluster 2 was more adverse.Using the molecular subtype as a trait, this study also screened subtype-related module genes through WGCNA, among which the HOXA and HOXB family genes may be the key genetic features of disease subtypes.These results provided a valuable reference for further understanding the molecular mechanisms of these potential markers in AML and their impact on prognosis.
In the present study, we screened 387 DEGs associated with Gln expression in AML samples.Most somatic variations in these genes were missense mutations.Among these, the TOP10 genes with the highest mutation frequencies in AML were IDH, TP53, WT1, IDH1, KRAS, PTPN11, ACACB, APC, NPC1, and QRICH2.IDH, a mutated enzyme in the citric acid cycle, leads to the production of the oncogenic metabolite R-2-hydroxyglutarate.This arrests the differentiation of hematopoietic stem cells, leading to the promotion of leukemia  [24].TP53-mutated AML is a unique subtype of AML with a poor prognosis [25].A longitudinal study tracking the evolution of mutations demonstrated that TP53 mutations represent primary mutational events in chemotherapy or radiation therapy-induced AML [26].A meta-analysis suggested that WT1 and TP53 mutations exhibit a mutually exclusive tendency in AML [27].WT1 has been reported to function as an oncogene and tumor suppressor in AML [28][29][30].IDH1 mutations occur in 6-10% of patients with AML [31], and inhibitors such as Ivosidenib and Azacitidine are currently available for this mutation [32].A previous study found that KRAS mutations were associated with poor prognosis in AML [33].Mutations in PTPN11 and KRAS confer resistance to combinations and multiple venetoclax combinations [34].However, our understanding of the roles of ACACB, APC, NPC1, and QRICH2 in AML is limited.This study identified mutated genes in AML that could contribute to treating AML.
Of the eight prognostic signatures identified in this study, seven genes, excluding MB, were significantly upregulated in AML.HGF, a hepatocyte growth factor, was confirmed to be upregulated in AML samples and cells and to promote the tumor malignancy of AML cells through targeted binding with miR-204 [35].The mutation load of CALR alleles increases during the transformation from primary thrombocythemia to AML [36].Activation of EIF4EBP1, eukaryotic translation initiation factor 4E binding protein 1, promotes AML cell proliferation and disease progression [37].High expression of EPHX1, a microsomal epoxide hydrolase 1, is significantly associated with high recurrence and low overall survival rates in patients with AML [38], consistent with our findings, suggesting that EPHX1 may promote disease progression in AML.ANGPTL3 suppresses the expression of Ikaros, a key regulator of hematopoietic cell differentiation, thereby promoting the expansion and stemness of HSCs.It stimulates cancer growth by promoting angiogenesis, cell proliferation, and migration [39].Recent studies have proposed that PDHA1, a key cuproptosis gene, is crucial for reprogramming glucose metabolism in tumor cells [40].One study found that PDHA1 was abnormally overexpressed in AML, consistent with our findings [41].The core of thrombosis is thrombin, a product of F2, which participates in coagulation.The upregulation of F2 may increase the risk of thrombotic bleeding complications in patients with AML [42].MB transports and stores oxygen in muscle cells, and a decrease in MB in patients with AML may be related to impaired heart function [43].The predictive effects of these eight genes on AML prognosis of AML is initially proposed in this study.A prognostic model constructed using eight genes accurately predicted prognostic risk in patients with AML.
Furthermore, this study found that the infiltration levels of CD8 T cells, γδT cells, resting NK cells, activated NK cells, and resting mast cells were significantly reduced in the high-risk prognostic group.Meanwhile, it is also found that the expression level of EPHX1 was negatively correlated with CD8 T cells, γδT cells, resting NK cells, and activated NK cells, and the expression of EIF4EBP1 was negatively correlated with CD8 T cells and γδT cells.Immune deficiency in AML is reflected in T cells and NK cells, where CD8 + T and diseased γδ T cells exhibit an exhausted state in AML at diagnosis [44].However, the relationship of EPHX1 and EIF4EBP1 expression with these immune cells has not been investigated in AML.Therefore, this study proposed that EIF4EBP1 and EPHX1 contribute to immunodeficiency and further affect the prognostic survival status of AML patients by negatively modulating the infiltration level of CD8 T cells and γδT cells.However, this hypothesis requires further investigation.
Based on consistent cluster analysis, this study identified two molecular subtypes of AML, with cluster 2 tending to have a poorer prognosis.The samples in cluster 2 were also more distributed in the high-risk group.Furthermore, CD8 T cells, γδT cells, and resting mast cells, which were proportionately downregulated in the highrisk group, were similarly infiltrated in reduced abundance in cluster 2. These results suggested an increased prognostic risk for patients in cluster 2, suggesting that stratification based on disease subtypes can identify survival probability in AML.WGCNA further identified the module genes associated with the subtype, and the PPI network suggested that the HOXA and HOXB family genes may be key genetic characteristics of the disease subtype.Gene amplification, deep deletion, and alterations in the mRNA expression of HOXA have been found in approximately 18% of AML samples, and HOXA3-10 serves as a potential AML therapeutic target and prognostic marker [45].A relevant bioinformatics analysis revealed that six HOXA and three HOXB genes were significantly underexpressed and hypermethylated in AML, accompanied by favorable cytogenetic profiles [46].Small-molecule inhibitors targeting the menin-lysine methyltransferase 2 A interaction restored normal HOXA expression in mutant AML, with therapeutic implications for AML patients [47].In addition, HOXA7, HOXA9, and HOXA11 are associated with AML risk status and prognosis [48].To further support the above findings, this study suggested that HOXA3-7, 9-11, and HOXB2-9 expression were closely associated with the prognostic risk predicted based on eight prognostic signatures.The expression levels of these markers can help stratify patients with AML and predict their prognostic risks.
However, this study had some limitations.First, the predictive performances of these eight prognostic signatures must be independently validated using a cohort of clinical samples.Moreover, the relationship between these prognostic signatures and immune cell infiltration, as well as their potential regulatory mechanisms, needs to be explored through in vivo and in vitro experiments.In the future, we will continue to explore the potential of these eight genes as prognostic targets in AML and investigate the impact of their expression levels on the efficacy of immunotherapy.

Conclusion
Based on 704 genes related to Gln metabolism, this study carried out differential expression analyses, as well as univariate, LASSO, and stepwise Cox regression analyses, and identified eight prognostic signatures.Prognostic and nomogram models constructed based on the expression levels can accurately identify the prognostic risk of AML.These prognostic signatures clustered patients with AML into two molecular subtypes with different prognostic risk patterns and immune profiles.Among the subtype-related module genes, the HOXA and HOXB family genes may be key genetic features of the AML subtypes.

Fig. 2
Fig. 2 Screening of prognostic signatures using the univariate, LASSO, and stepwise Cox regression analyses.A. The univariate Cox analysis of 26 Gln-DEGs.B. Differences in gene Expression between AML and whole blood samples.****p < 0.0001.C. Left panel: Distribution of the LASSO coefficients.Right panel: Selection of lambda min based on the likelihood deviation of the LASSO coefficient distribution.D. Eight prognostic signatures identified using stepwise Cox regression analysis

Fig. 3 Fig. 4
Fig. 3 Construction a risk score-based model in the training set (A) and the verification of the model in the validation cohort (B).Left panel: risk score distribution and survival status of all AML samples; Middle panel: KM curve displaying the difference in survival probability between high-and low-risk groups; Right panel: ROC curves showing the ability of the model to predict 1-, 3-, and 5-year survival prognosis

Fig. 5 Fig. 6
Fig. 5 Comparison of infiltration of immune cells and expression of immune-related genes between the high-and low-risk groups.A Differences in the infiltration of 22 types of immune cells between the high-and low-risk groups.B: Correlation between immune cell infiltration and expression of eight prognostic signatures.C Box plot showed differences in the expression of immune checkpoint genes between the two groups.*p < 0.5; **p < 0.01; ***p < 0.001; ****p < 0.0001.D. The box plot presented the expression differences in HLA family genes between the two groups.*p < 0.5; **p < 0.01; ***p < 0.001; ****p < 0.0001

Fig. 7
Fig. 7 Screening of hub genes related to AML subtypes using WGCNA and PPI analysis.A. The scale-free fit index of β at soft thresholds of 1-20.B. The Genes were categorized into seven modules using hierarchical clustering.C Correlation between modules and subtypes.D. PPI networks comprising 48 genes and 161 interaction pairs